Electronic device and method for controlling electronic device

ABSTRACT

An electronic device is provided that includes a memory, a communication unit configured to receive a video image streamed from a first electronic device, a display unit configured to display the video image, and a control unit configured to identify a specific area included in the video image, to store in the memory a first image displayed on the specific area at a first time point, and to store in the memory a second image displayed on the specific area at a second time point when a variation of the image displayed on the specific area is equal to or greater than a predetermined threshold.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to Korean Patent Application No. ______ filed on ______, the contents of which are incorporated herein by reference in their entirety.

BACKGROUND

1. Technical Field

Embodiments of the present invention are directed to an electronic device and a method of operating the electronic device, and more specifically, to an electronic device that may be used for a video conference and a method of controlling the electronic device.

2. Discussion of the Related Art

Tele-presence refers to a set of technologies that allow a person to feel as if he or she were actually present somewhere else. Tele-presence technologies reproduce at a remote location the information on the five senses that a person perceives in a specific space. Element technologies for tele-presence may include video, audio, tactile, and network transmission technologies. Such tele-presence technologies are adopted in video conference systems. Tele-presence-based video conference systems provide higher-quality communication and allow users to concentrate more on the conversation than conventional video conference systems do.

The tele-presence technologies for video conference systems, although they differ slightly from manufacturer to manufacturer, may be applied to video, audio, and network transmission technologies as follows:

For video, tele-presence technologies generate natural eye-contact images, which make a user feel as if he or she were facing another user, and generate high-resolution images. For audio, tele-presence technologies provide playback that creates a sense of space based on a speaker's location. For network transmission, tele-presence technologies provide real-time image/sound transmission based on an MCU (Multipoint Control Unit).

Whereas video, audio, and network transmission for video conference systems have been actively researched, data sharing between conference attendees remains unsatisfactory. Current video conference systems use a separate monitor for data sharing. Accordingly, when a user shifts his or her eyes from the image screen to the data screen, eye contact is lost, which weakens the feeling of actually facing another user. Moreover, the conversation is briefly interrupted at every data manipulation because the data are manipulated with a peripheral device, such as a mouse.

SUMMARY

Embodiments of the present invention provide an electronic device and a method of operating the electronic device that may enable a more vivid video conference.

According to an embodiment of the present invention, there is provided an electronic device including a memory, a communication unit configured to receive a video image streamed from a first electronic device, a display unit configured to display the video image, and a control unit configured to identify a specific area included in the video image, to store in the memory a first image displayed on the specific area at a first time point, and to store in the memory a second image displayed on the specific area at a second time point when a variation of the image displayed on the specific area is equal to or greater than a predetermined threshold.

The control unit is configured to store the first image in association with the first time point and to store the second image in association with the second time point.

The first and second images are still images.

The control unit is configured to determine the variation of the image displayed on the specific area based on a difference between the image currently displayed on the specific area and the first image stored in the memory.
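A minimal sketch of the threshold test, assuming decoded frames are available as grayscale pixel arrays and the specific area is an axis-aligned rectangle. The mean absolute pixel difference used as the variation measure, and the names `crop`, `variation`, and `AreaRecorder`, are illustrative assumptions rather than the method of this disclosure. The reference is the most recently stored image, which is the first image until a second one has been stored.

```python
from typing import List, Tuple

Image = List[List[int]]           # grayscale pixels (0-255), row-major (assumed)
Area = Tuple[int, int, int, int]  # (top, left, height, width) of the specific area


def crop(frame: Image, area: Area) -> Image:
    """Return the pixels inside the specific area."""
    top, left, height, width = area
    return [row[left:left + width] for row in frame[top:top + height]]


def variation(current: Image, reference: Image) -> float:
    """Mean absolute pixel difference between two equally sized images (0-255)."""
    diffs = [abs(c - r)
             for cur_row, ref_row in zip(current, reference)
             for c, r in zip(cur_row, ref_row)]
    return sum(diffs) / len(diffs) if diffs else 0.0


class AreaRecorder:
    """Keeps (time point, still image) pairs for the specific area."""

    def __init__(self, area: Area, threshold: float) -> None:
        self.area = area
        self.threshold = threshold
        self.snapshots: List[Tuple[float, Image]] = []

    def on_frame(self, t: float, frame: Image) -> bool:
        """Store the area's image if it is the first one or has changed enough."""
        region = crop(frame, self.area)
        if not self.snapshots:
            self.snapshots.append((t, region))  # first image at the first time point
            return True
        reference = self.snapshots[-1][1]       # most recently stored image
        if variation(region, reference) >= self.threshold:
            self.snapshots.append((t, region))  # next image at the next time point
            return True
        return False
```

In practice, the threshold would likely need to sit well above the level of compression noise in the stream so that near-duplicate pages are not stored; that tuning is left open here.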

The control unit is configured to analyze the video image to identify the specific area.

The control unit is configured to receive information on the specific area from the first electronic device through the communication unit and to identify the specific area based on the received information.

According to an embodiment of the present invention, there is provided an electronic device including a memory, a communication unit configured to receive a video image streamed from a first electronic device, a display unit configured to display the video image, and a control unit configured to identify a specific area included in the video image, to store in the memory, whenever the content displayed on the specific area changes, a still image reflecting that content so that the still image corresponds to the time point at which the content changes, to determine a time point corresponding to a predetermined request when the request is received, and to call from the memory the still image corresponding to the determined time point.

The control unit is configured to display both the streamed video image and the called still image on the display unit.

The control unit is configured to display the still image on a second area of the video image, the second area not overlapping the specific area.

The control unit is configured to replace the image displayed on the specific area of the streamed video image with the still image and to display the resulting image on the display unit.
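The step of calling the still image corresponding to a determined time point might be sketched as a binary search, assuming the still images are kept as (time point, image) pairs sorted by the time point at which each was stored. Because each image remains on the specific area until the content changes again, the image shown at time `t` is the one stored at the latest time point not after `t`. The function name `still_image_at` is hypothetical.

```python
import bisect
from typing import List, Optional, Tuple, TypeVar

T = TypeVar("T")


def still_image_at(snapshots: List[Tuple[float, T]], t: float) -> Optional[T]:
    """Return the still image that was displayed on the specific area at time t.

    `snapshots` holds (time point, still image) pairs sorted by time point.
    """
    times = [tp for tp, _ in snapshots]
    i = bisect.bisect_right(times, t)  # number of snapshots stored at or before t
    if i == 0:
        return None                    # nothing had been captured yet at time t
    return snapshots[i - 1][1]
```

For example, with pages stored at 0 s, 30 s, and 95.5 s, a request for 29.9 s returns the first page and a request for 30 s returns the second.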

According to an embodiment of the present invention, there is provided an electronic device including a memory, a communication unit configured to receive at least one multimedia data clip from at least one second electronic device, a display unit configured to display the at least one multimedia data clip, and a control unit configured to identify a first speaker corresponding to audio data included in the at least one multimedia data clip, to obtain information corresponding to the identified first speaker, and to store the obtained information so that the obtained information corresponds to a first time point for the at least one multimedia data clip.

The first time point is when the first speaker begins to speak.

The control unit is configured to analyze audio data included in the at least one multimedia data clip and to determine, as the first time point, the time point at which a human voice included in the audio data is sensed.
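A minimal sketch of sensing the first time point, assuming mono PCM samples normalized to [-1.0, 1.0]. It substitutes a simple short-time energy threshold for the human-voice detection described above; an energy test reacts to any loud sound, so a practical system would likely use a dedicated voice activity detector. The function name and the defaults (20 ms frames, -40 dBFS, three consecutive frames) are illustrative assumptions.

```python
import math
from typing import Optional, Sequence


def first_voice_time(samples: Sequence[float], sample_rate: int,
                     frame_ms: float = 20.0, threshold_db: float = -40.0,
                     min_frames: int = 3) -> Optional[float]:
    """Return the start time (s) of the first run of `min_frames` consecutive
    frames whose RMS level exceeds `threshold_db` dBFS, or None if none exists."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    run_start = 0
    run_len = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        level = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if level > threshold_db:
            if run_len == 0:
                run_start = start       # a loud run begins here
            run_len += 1
            if run_len >= min_frames:   # ignore clicks shorter than min_frames
                return run_start / sample_rate
        else:
            run_len = 0
    return None
```

Requiring several consecutive frames is one simple way to avoid treating a short click as the start of speech.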

The control unit is configured to analyze video data included in the at least one multimedia data clip to identify the first speaker.

The control unit is configured to identify the first speaker based on a lip motion included in the video data.

Information relating to the first speaker includes at least one of personal information on the first speaker, information on a place where the first speaker is positioned, and a keyword spoken by the first speaker.

According to an embodiment of the present invention, there is provided an electronic device including a communication unit configured to receive at least one multimedia data clip streamed from at least one second electronic device, a memory configured to store the at least one multimedia data clip, a display unit configured to display video data included in the at least one multimedia data clip, and a control unit configured to, whenever a speaker corresponding to audio data included in the at least one multimedia data clip changes, store information corresponding to the speaker so that the information corresponds to a time point when the speaker changes, to determine a time point corresponding to a predetermined input when the predetermined input is received, and to call at least part of a multimedia data clip corresponding to the determined time point from the memory.

The control unit is configured to display both the video data included in the streamed at least one multimedia data clip and video data included in the called at least part of the multimedia data clip.

The electronic device further includes a sound output unit, wherein the control unit is configured to output through the sound output unit at least one of audio data included in the streamed at least one multimedia data clip and audio data included in the called at least part of the multimedia data clip.

The control unit is configured to display both the video data included in the streamed at least one multimedia data clip and text data corresponding to audio data included in the called at least part of the multimedia data clip.
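The speaker-change bookkeeping described above can be sketched as follows, assuming some upstream step (audio or lip-motion analysis) already yields a speaker identifier, or None, for each analyzed segment. The class and field names are hypothetical. In this sketch a silent gap followed by the same speaker does not create a new entry; that is a design choice, not something the description specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpeakerEntry:
    time_point: float                                   # seconds from conference start
    speaker: str                                        # identified speaker
    info: Dict[str, str] = field(default_factory=dict)  # e.g. name, place, keyword


class ConferenceMetadata:
    """Records an entry each time the identified speaker changes."""

    def __init__(self) -> None:
        self.entries: List[SpeakerEntry] = []

    def observe(self, t: float, speaker: Optional[str],
                info: Optional[Dict[str, str]] = None) -> None:
        if speaker is None:
            return  # no speaker identified in this segment (e.g. silence)
        if self.entries and self.entries[-1].speaker == speaker:
            return  # same speaker as before: nothing to record
        self.entries.append(SpeakerEntry(t, speaker, dict(info or {})))

    def time_points_for(self, speaker: str) -> List[float]:
        """Time points at which `speaker` began to speak, usable as seek targets."""
        return [e.time_point for e in self.entries if e.speaker == speaker]
```

A predetermined input such as "show where U1 spoke" could then be mapped to the returned time points, and the stored clip could be played back from each one.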

The embodiments of the present invention may provide the following effects.

First, the second user, who attends the video conference at the second place, may store images of the presentation material without separately receiving the presentation data from the first user, who conducts the video conference at the first place. The images of the presentation material may be extracted from the video image of the video conference and stored by the electronic device of the second user, without any cumbersome process such as receiving separate electronic data (for example, electronic files) for the presentation material from the first user in advance.

Second, when the conference uses materials that are difficult to convert into data (for example, samples used to introduce a prototype model), or when the first user has not converted the presentation material into electronic data in advance, the presentation material may, according to an embodiment, be converted into image data while it is being used in the conference, so that the second user may view the presentation material again later.

Third, the second user may review previous pages of the presentation material used in the video conference hosted by the first user while the video conference is in progress, thereby enabling a more efficient video conference.

Fourth, the electronic device may continuously monitor the speakers during the video conference and may store various types of information on the speakers in association with the time points at which the speakers begin to speak, thereby generating video conference metadata. The video conference metadata may be used in various ways, enhancing user convenience. For example, the video conference metadata may be used to prepare brief minutes of the video conference for distribution to the attendees, or to provide a search function that allows the attendees to review the conference.

Fifth, by identifying the speaker and outputting the multimedia data clip corresponding to the speaker in a manner different from the other multimedia data clips, more attention can be directed toward the user who is speaking in the video conference, allowing the video conference to proceed more efficiently.

Finally, during or after the video conference, specific time points of the multimedia data of the video conference may be searched to review the conference, and the multimedia data corresponding to the searched time points may be output.

BRIEF DESCRIPTION OF THE DRAWINGS

The embodiments of the present invention will become readily apparent by reference to the following detailed description when considered in conjunction with the accompanying drawings wherein:

FIG. 1 is a block diagram illustrating an electronic device according to an embodiment of the present invention;

FIG. 2 is a view illustrating an example where a user inputs a gesture to an electronic device as shown in FIG. 1;

FIG. 3 is a view for describing an environment according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating a method of controlling an electronic device according to an embodiment of the present invention;

FIG. 5 is a view illustrating a video image displayed by an electronic device according to an embodiment of the present invention;

FIG. 6 is a view illustrating a screen viewed after the page of the presentation material displayed on the specific area has changed according to an embodiment of the present invention;

FIG. 7 is a view illustrating an example of storing a plurality of images displayed on a specific area according to an embodiment of the present invention;

FIGS. 8 to 10 are views illustrating methods of obtaining a predetermined request according to embodiments of the present invention;

FIGS. 11 and 12 are views illustrating exemplary methods of displaying an obtained image according to embodiments of the present invention;

FIG. 13 is a view schematically illustrating an environment to which an embodiment of the present invention applies;

FIG. 14 is a flowchart illustrating a method of controlling an electronic device according to an embodiment of the present invention;

FIGS. 15 to 19 are views illustrating embodiments of the present invention;

FIG. 20 is a view illustrating an example of video conference metadata stored according to an embodiment of the present invention;

FIG. 21 is a flowchart illustrating a method of controlling an electronic device according to an embodiment of the present invention;

FIG. 22 illustrates examples of receiving the predetermined input by various methods according to an embodiment of the present invention; and

FIGS. 23 to 25 are views illustrating examples of outputting the current and past multimedia data according to an embodiment of the present invention.

DESCRIPTION OF THE EMBODIMENTS

The present invention will now be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown. The invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art.

Hereinafter, an electronic device relating to the present invention will be described in more detail with reference to the accompanying drawings. In the following description, the suffixes “module” and “unit” are given to components of the electronic device only for ease of description and do not have meanings or functions distinguished from each other.

FIG. 1 is a block diagram illustrating an electronic device according to an embodiment of the present invention.

Referring to FIG. 1, the electronic device 100 includes a communication unit 110, a user input unit 120, an output unit 150, a memory 160, an interface unit 170, a control unit 180, and a power supply unit 190. The components shown in FIG. 1 are components that may be commonly included in an electronic device; accordingly, more or fewer components may be included in the electronic device 100.

The communication unit 110 may include one or more modules that enable communication between the electronic device 100 and a communication system or between the electronic device 100 and another device. For instance, the communication unit 110 may include a broadcast receiving unit 111, an Internet module 113, and a near-field communication module 114.

The broadcast receiving unit 111 receives broadcast signals and/or broadcast-related information from an external broadcast managing server through a broadcast channel.

The broadcast channel may include a satellite channel and a terrestrial channel. The broadcast managing server may refer to a server that generates and broadcasts broadcast signals and/or broadcast-related information, or a server that receives pre-generated broadcast signals and/or broadcast-related information and broadcasts them to a terminal. The broadcast signals may include TV broadcast signals, radio broadcast signals, and data broadcast signals, as well as combinations of TV or radio broadcast signals with data broadcast signals.

The broadcast-related information may refer to information relating to broadcast channels, broadcast programs, or broadcast service providers. The broadcast-related information may be provided through a communication network.

The broadcast-related information may exist in various forms, such as, for example, EPGs (Electronic Program Guides) of DMB (Digital Multimedia Broadcasting) or ESGs (Electronic Service Guides) of DVB-H (Digital Video Broadcast-Handheld).

The broadcast receiving unit 111 may receive broadcast signals using various broadcast systems. Broadcast signals and/or broadcast-related information received through the broadcast receiving unit 111 may be stored in the memory 160.

The Internet module 113 may refer to a module for access to the Internet. The Internet module 113 may be provided inside or outside the electronic device 100.

The near-field communication module 114 refers to a module for near-field communication. Near-field communication technologies may include Bluetooth, RFID (Radio Frequency Identification), IrDA (Infrared Data Association), UWB (Ultra Wideband), and ZigBee technologies.

The user input unit 120 allows a user to input audio or video signals and may include a camera 121 and a microphone 122.

The camera 121 processes image frames, such as still images or videos, obtained by an image sensor in a video call mode or an image capturing mode. The processed image frames may be displayed by the display unit 151. The camera 121 may perform 2D or 3D image capturing or may be configured as one or a combination of 2D and 3D cameras.

The image frames processed by the camera 121 may be stored in the memory 160 or may be transmitted to an outside device through the communication unit 110. According to an embodiment, two or more cameras 121 may be included in the electronic device 100.

The microphone 122 receives external sound signals in a call mode, a recording mode, or a voice recognition mode and processes the received signals into electrical voice data. The microphone 122 may perform various noise cancelling algorithms to remove noise generated while receiving the external sound signals. A user may input various voice commands to the electronic device 100 through the microphone 122 to drive the electronic device 100 and to perform its functions.

The output unit 150 may include a display unit 151 and a sound output unit 152.

The display unit 151 displays information processed by the electronic device 100. For example, the display unit 151 displays a UI (User Interface) or GUI (Graphic User Interface) associated with the electronic device 100. The display unit 151 may be at least one of a liquid crystal display, a thin film transistor liquid crystal display, an organic light emitting diode display, a flexible display, and a 3D display. The display unit 151 may be of a transparent or light-transmissive type, which may be called a “transparent display”; an example is a transparent LCD. The display unit 151 may have a light-transmissive rear structure, in which a user may view an object positioned behind the terminal body through the area occupied by the display unit 151 in the terminal body.

According to an embodiment, two or more display units 151 may be included in the electronic device 100. For instance, the electronic device 100 may include a plurality of display units 151 that are integrally or separately arranged on a surface of the electronic device 100 or on respective different surfaces of the electronic device 100.

When the display unit 151 and a sensor sensing a touch (hereinafter referred to as a “touch sensor”) form a layered structure (hereinafter referred to as a “touch screen”), the display unit 151 may be used as an input device as well as an output device. The touch sensor may include, for example, a touch film, a touch sheet, or a touch pad.

The touch sensor may be configured to convert a change in pressure or capacitance, which occurs at a certain area of the display unit 151, into an electrical input signal. The touch sensor may be configured to detect the pressure exerted during a touch as well as the position or area of the touch.

Upon touch on the touch sensor, a corresponding signal is transferred to a touch controller. The touch controller processes the signal to generate corresponding data and transmits the data to the control unit 180. By doing so, the control unit 180 may recognize the area of the display unit 151 where the touch occurred.

The sound output unit 152 may output audio data received from the communication unit 110 or stored in the memory 160. The sound output unit 152 may output sound signals associated with functions (e.g., call signal receipt sound, message receipt sound, etc.) performed by the electronic device 100. The sound output unit 152 may include a receiver, a speaker, and a buzzer.

The memory 160 may store a program for operating the control unit 180 and may temporarily store input/output data (for instance, phone books, messages, still images, videos, etc.). The memory 160 may store data on vibrations and sounds of various patterns that are output when the touch screen is touched.

The memory 160 may include at least one storage medium of flash memory types, hard disk types, multimedia card micro types, card type memories (e.g., SD or XD memories), RAMs (Random Access Memories), SRAM (Static Random Access Memories), ROMs (Read-Only Memories), EEPROMs (Electrically Erasable Programmable Read-Only Memories), PROM (Programmable Read-Only Memories), magnetic memories, magnetic discs, and optical discs. The electronic device 100 may operate in association with a web storage performing a storage function of the memory 160 over the Internet.

The interface unit 170 functions as a path between the electronic device 100 and any external device connected to the electronic device 100. The interface unit 170 receives data or power from an external device and transfers the data or power to each component of the electronic device 100 or enables data to be transferred from the electronic device 100 to the external device. For instance, the interface unit 170 may include a wired/wireless headset port, an external recharger port, a wired/wireless data port, a memory card port, a port connecting a device having an identification module, an audio I/O (Input/Output) port, a video I/O port, and an earphone port.

The control unit 180 controls the overall operation of the electronic device 100. For example, the control unit 180 performs control and processing associated with voice calls, data communication, and video calls. The control unit 180 may include an image processing unit 182 for image processing. The image processing unit 182 is described in greater detail below where relevant.

The power supply unit 190 receives internal or external power under control of the control unit 180 and supplies the power to each component for operation of the component.

The embodiments described herein may be implemented in software, hardware, or a combination thereof, using a recording medium readable by a computer or a similar device. When implemented in hardware, the embodiments may use at least one of ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), processors, controllers, micro-controllers, microprocessors, and electrical units for performing functions. According to an embodiment, the embodiments may be implemented by the control unit 180.

When implemented in software, some embodiments, such as procedures or functions, may be implemented with separate software modules, each of which performs at least one function or operation. Software code may be implemented as a software application written in any suitable programming language. The software code may be stored in the memory 160 and executed by the control unit 180.

FIG. 2 is a view illustrating an example where a user inputs a gesture to an electronic device as shown in FIG. 1.

Referring to FIG. 2, the electronic device 100 may capture the gesture of the user U and may perform a proper function corresponding to the gesture.

The electronic device 100 may be any electronic device having the display unit 151 that can display images. The electronic device 100 may be a stationary terminal, such as a TV shown in FIG. 2, which is bulky and thus placed in a fixed position, or may be a mobile terminal such as a cell phone. The electronic device 100 may include the camera 121 that may capture the gesture of the user U.

The camera 121 may be an optical electronic device that performs image capturing in a front direction of the electronic device 100. The camera 121 may be a 2D camera for 2D image capturing and/or a 3D camera for 3D image capturing. Although in FIG. 2 one camera 121 is provided at a top central portion of the electronic device 100 for ease of description, the number, location, and type of the camera 121 may vary as necessary.

The control unit 180 may track a user U having a control right once the user U is detected. Granting and tracking of the control right may be performed based on an image captured by the camera 121. For example, the control unit 180 may analyze a captured image and continuously determine whether a specific user U is present, whether the specific user U makes a gesture required for obtaining the control right, and whether the specific user U moves.

The control unit 180 may analyze a gesture of a user having the control right based on a captured image. For example, when the user U makes a predetermined gesture but does not own the control right, no function may be conducted. However, when the user U has the control right, a predetermined function corresponding to the predetermined gesture may be conducted.

The gesture of the user U may include various operations using his/her body. For example, the gesture may include the operation of the user sitting down, standing up, running, or even moving. Further, the gesture may include operations using the user's head, foot, or hand H. For convenience of illustration, a gesture of using the hand H of the user U is described below as an example. However, the embodiments of the present invention are not limited thereto.

According to an embodiment, analysis of a hand gesture may be conducted in the following ways.

First, the user's fingertips are detected, and the number and shape of the fingertips are analyzed and then converted into a gesture command.

The detection of the fingertips may be performed in two steps.

First, the hand area may be detected using human skin tone. A group of candidates for the hand area is designated based on the skin tone, and the contours of the candidates are extracted. Among the candidates, a candidate whose contour has a number of points within a predetermined range may be selected as the hand.
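This first step might be sketched as a per-pixel skin test followed by a point-count filter on the candidate contours. The sketch assumes RGB pixels, converts them to the Cb/Cr chroma channels with the ITU-R BT.601 coefficients, and uses the fixed ranges Cb 77-127 and Cr 133-173, a commonly cited heuristic that this document does not specify. Extracting contours from the mask is omitted; it could be done with a border-following routine such as OpenCV's `cv2.findContours`. All function names here are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255
Point = Tuple[int, int]


def is_skin(r: int, g: int, b: int) -> bool:
    """Skin test on the BT.601 chroma channels with fixed (assumed) ranges."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return 77 <= cb <= 127 and 133 <= cr <= 173


def skin_mask(image: Sequence[Sequence[Pixel]]) -> List[List[bool]]:
    """Per-pixel skin mask; contours of its connected regions are hand candidates."""
    return [[is_skin(*px) for px in row] for row in image]


def select_hands(contours: Dict[str, Sequence[Point]],
                 min_points: int, max_points: int) -> List[str]:
    """Keep candidates whose contour point count lies in the predetermined range."""
    return [cid for cid, pts in contours.items()
            if min_points <= len(pts) <= max_points]
```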

Second, to determine the fingertips, the contour of the candidate selected as the hand is traversed, and the curvature at each point is calculated from inner products between the vectors to adjacent points. Because the curvature changes sharply at the fingertips, a contour point at which the change in curvature exceeds a threshold value is chosen as a fingertip of the hand. The fingertips extracted in this way may be converted into meaningful commands during gesture-command conversion.
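The curvature test can be sketched with the so-called k-curvature measure: at each contour point, the cosine of the angle between the vectors to the points `k` steps before and after is computed from their inner product, and values near 1 indicate a sharp turn. The sketch assumes the contour is a closed list of (x, y) points. It adds two steps that the description does not mention: a cross-product sign check against the contour orientation, which rejects equally sharp valleys between fingers, and a suppression step that keeps only the sharpest point in each neighborhood. The function name and default parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _signed_area(pts: Sequence[Point]) -> float:
    """Shoelace formula; the sign gives the traversal direction of the contour."""
    n = len(pts)
    return 0.5 * sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
                     for i in range(n))


def fingertips(contour: Sequence[Point], k: int = 5,
               cos_threshold: float = 0.7) -> List[int]:
    """Indices of contour points that look like fingertips (sharp, convex turns)."""
    n = len(contour)
    orientation = _signed_area(contour)
    scores = {}
    for i in range(n):
        px, py = contour[i]
        ax, ay = contour[(i - k) % n][0] - px, contour[(i - k) % n][1] - py
        bx, by = contour[(i + k) % n][0] - px, contour[(i + k) % n][1] - py
        na, nb = math.hypot(ax, ay), math.hypot(bx, by)
        if na == 0 or nb == 0:
            continue
        cos = (ax * bx + ay * by) / (na * nb)  # inner product -> sharpness
        # Cross product of incoming (-a) and outgoing (b) directions; a convex
        # turn has the same sign as the contour orientation.
        cross = (-ax) * by - (-ay) * bx
        if cos > cos_threshold and cross * orientation > 0:
            scores[i] = cos
    # Keep only the sharpest candidate within +-k indices.
    return [i for i, s in scores.items()
            if all(scores.get((i + d) % n, -2.0) <= s
                   for d in range(-k, k + 1) if d != 0)]
```

On a densely sampled hand contour, `k` would typically span several points so that pixel-level jitter does not dominate the angle.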

According to an embodiment, for a gesture command directed at a synthesized virtual 3D image (3D object), it is often necessary to determine whether contact has occurred between the virtual 3D image and the user's gesture. For example, to manipulate a virtual object placed among actual objects, it is often necessary to determine whether an actual object is in contact with the virtual object.

Whether the contact is present or not may be determined by various collision detection algorithms. For instance, a rectangle bounding box method and a bounding sphere method may be adopted for such judgment.

The rectangle bounding box method detects collision by comparing the regions of the rectangles that surround 2D objects. The rectangle bounding box method has the advantages of low computational cost and ease of implementation. The bounding sphere method determines whether a collision has occurred by comparing the distance between the centers of the spheres surrounding 3D objects with the sum of their radii.
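Both tests reduce to a few comparisons, which is where their low cost comes from. The sketch below assumes axis-aligned rectangles given by a corner and a size, and treats touching boundaries as contact in both tests; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


@dataclass
class Sphere:
    cx: float
    cy: float
    cz: float
    r: float


def rects_collide(a: Rect, b: Rect) -> bool:
    """Axis-aligned rectangles overlap iff their extents overlap on both axes."""
    return (a.x <= b.x + b.w and b.x <= a.x + a.w and
            a.y <= b.y + b.h and b.y <= a.y + a.h)


def spheres_collide(a: Sphere, b: Sphere) -> bool:
    """Spheres touch or overlap iff the center distance is at most the sum of radii."""
    d2 = (a.cx - b.cx) ** 2 + (a.cy - b.cy) ** 2 + (a.cz - b.cz) ** 2
    return d2 <= (a.r + b.r) ** 2  # compare squares to avoid a square root
```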

For example, a depth camera may be used to allow a real hand to manipulate a virtual object. The depth information of the hand obtained by the depth camera is converted into the distance units of the virtual world used to render the virtual image, and collision with the virtual object may be detected based on the resulting coordinates.
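One way to perform this conversion is to back-project a depth pixel with a pinhole camera model and rescale millimeters to virtual-world units, then test the resulting point against the virtual object's bounding sphere. The pinhole model, the intrinsic parameters, the millimeter depth unit, and the `units_per_mm` scale are all assumptions for illustration; the axis conventions of the virtual world (for example, whether y points up) are left to the renderer.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class DepthIntrinsics:
    fx: float  # focal length in pixels along x
    fy: float  # focal length in pixels along y
    cx: float  # principal point x in pixels
    cy: float  # principal point y in pixels


def depth_pixel_to_virtual(u: float, v: float, depth_mm: float,
                           k: DepthIntrinsics, units_per_mm: float) -> Vec3:
    """Back-project pixel (u, v) at the given depth, then rescale to virtual units."""
    x_mm = (u - k.cx) * depth_mm / k.fx
    y_mm = (v - k.cy) * depth_mm / k.fy
    return (x_mm * units_per_mm, y_mm * units_per_mm, depth_mm * units_per_mm)


def touches_virtual_sphere(point: Vec3, centre: Vec3, radius: float) -> bool:
    """Collision of a hand point with a virtual object's bounding sphere."""
    return math.dist(point, centre) <= radius
```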

Hereinafter, an exemplary environment in which the embodiments of the present invention are implemented is described. FIG. 3 is a view for describing an environment according to an embodiment of the present invention.

Referring to FIG. 3, a first user U1 and a second user U2 are positioned in a first place and a second place, respectively. The first user U1 may be a person who hosts a video conference and/or provides lectures to a number of other people including the second user U2, and the second user U2 may be a person who attends the video conference hosted by the first user U1.

A voice and/or motion of the first user U1 may be obtained and converted into video data and/or audio data by an electronic device 200 arranged in the first place. The video data and/or audio data may then be transferred through a predetermined network (communication network) to another electronic device 300 positioned in the second place. The first electronic device 300 may output the transferred video data and/or audio data through an output unit in a visual or auditory manner. The second electronic device 200 and the first electronic device 300 may each be the same or substantially the same as the electronic device 100 described in connection with FIG. 1. However, according to an embodiment, each of the second electronic device 200 and the first electronic device 300 may include only some of the components of the electronic device 100. According to an embodiment, the components of the second electronic device 200 may be different from the components of the first electronic device 300.

FIG. 3 illustrates an example where the second electronic device 200 obtains and transfers the video data and/or audio data and the first electronic device 300 outputs the transferred video data and/or audio data. According to an embodiment, the second electronic device 200 and the first electronic device 300 may swap their functions and operations, or alternatively, each of the second electronic device 200 and the first electronic device 300 may perform all of the functions described above.

For example, the first user U1 may transfer his or her image and/or voice through the second electronic device 200 to the first electronic device 300 and may receive and output an image and/or voice of the second user U2. Likewise, the first electronic device 300 may also perform the same functions and operations as the second electronic device 200.

Hereinafter, a method of controlling an electronic device according to an embodiment of the present invention is described. For purposes of illustration, the control method is described as being performed by the electronic device 100 described in connection with FIG. 1. As used herein, the “first electronic device” refers to the electronic device 300 shown in FIG. 3, which is positioned in the second place, and the “second electronic device” refers to the electronic device 200 shown in FIG. 3, which is positioned in the first place. However, the embodiments of the present invention are not limited thereto.

FIG. 4 is a flowchart illustrating a method of controlling an electronic device according to an embodiment of the present invention.

Referring to FIG. 4, the control method of an electronic device may include a step of receiving a video image from a second electronic device 200 (S100), a step of displaying the video image on the display unit 151 (S110), a step of identifying a specific area of the video image (for example, an area where a presentation material necessary for performing a video conference is displayed) (S120), and a step of storing an image (e.g., first image) displayed on the specific area (S130).

The control method may further include a step of determining whether a variation of the image displayed on the specific area is equal to or larger than a predetermined threshold (S140) and a step of storing an image (e.g., second image) displayed on the specific area when the variation of the image is equal to or larger than the predetermined threshold (S150). When the variation of the image is smaller than the predetermined threshold, the first electronic device 300 may continue to monitor whether the variation of the image becomes equal to or larger than the threshold (S140).

The first electronic device 300 may continuously display video images received from the second electronic device 200 on the display unit 151. When the first electronic device 300 receives a predetermined request while performing steps S100 to S150 (S160), the first electronic device 300 obtains an image corresponding to the request among the images stored in the memory 160 in steps S130 and/or S150 (S170) and displays the obtained image on the display unit 151 (S180). Hereinafter, the steps are described in greater detail.

The first electronic device 300 positioned in the second place may receive a video image from the second electronic device 200 (S100). The video image may be streamed from the second electronic device 200 to the first electronic device 300.

The video image may be obtained by the second electronic device 200. For instance, the video image may include a scene relating to a video conference performed by the first user U1 or a scene relating to an online lecture conducted by the first user U1 as obtained by the second electronic device 200.

The video image may be a video image that is obtained by the camera 121 included in the second electronic device 200 and reflects a real situation. The video image may be a composite image of a virtual image and a video image reflecting a real situation. At least part of the video image reflecting the real situation may be replaced by another image.

The video image may be directly transmitted from the second electronic device 200 to the first electronic device 300 or may be transmitted from the second electronic device 200 to the first electronic device 300 via a server (not shown).

The first electronic device 300 may generate a control signal for visually representing a video image (S110). The first electronic device 300 may visually output the video image through the display unit 151 or a beam projector (not shown) according to the control signal.

The first electronic device 300 may identify a specific area included in the video image (S120). For instance, the first electronic device 300 may identify an area which displays a presentation material necessary for performing a video conference.

FIG. 5 is a view illustrating a video image displayed by an electronic device according to an embodiment of the present invention. Referring to FIG. 5, a video image obtained by the second electronic device 200 for the first user U1 at the first place, as shown in FIG. 3, may be displayed through the first electronic device 300. A first image I1 of a presentation material (or lecture material, jointly referred to as “presentation material”) necessary for a video conference and/or lecture may be displayed on a specific area SA. The first image I1 may be an image reflecting an actual situation. Alternatively, the first image I1 may be a composite image made by the second electronic device 200.

The first electronic device 300 may identify the specific area SA on which the presentation material is displayed as described above. The first electronic device 300 may employ various methods to identify the specific area SA. For example, the first electronic device 300 may use an image processing technology to analyze the video image and find an area on which marks such as letters and/or diagrams are densely displayed, thereby identifying the specific area SA. As another example, the second electronic device 200 may transmit location information of the specific area SA to the first electronic device 300, either together with the video image or separately from it. The first electronic device 300 may identify the specific area SA based on the transmitted location information.
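The image-processing approach can be illustrated with a minimal sketch. It assumes the video frame is available as a 2-D grayscale array, treats strong intensity differences between neighbouring pixels as the "marks" of letters and diagrams, and returns the bounding box of tiles whose edge density is high. The function name `find_specific_area` and the parameters `tile`, `edge_thresh`, and `density_thresh` are hypothetical illustrations, not values taken from this description.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def find_specific_area(gray: List[List[int]], tile: int = 4,
                       edge_thresh: int = 40,
                       density_thresh: float = 0.5) -> Optional[Box]:
    """Hypothetical sketch: bounding box of edge-dense tiles, or None if none found."""
    h, w = len(gray), len(gray[0])
    # A pixel counts as an edge if it differs sharply from its right or lower neighbour.
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(gray[y][x] - gray[y][x + 1]) if x + 1 < w else 0
            dy = abs(gray[y][x] - gray[y + 1][x]) if y + 1 < h else 0
            edges[y][x] = max(dx, dy) >= edge_thresh

    left: Optional[int] = None
    top: Optional[int] = None
    right = bottom = 0
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            cells = [edges[y][x]
                     for y in range(ty, min(ty + tile, h))
                     for x in range(tx, min(tx + tile, w))]
            # Tiles crowded with edges are assumed to contain letters or diagrams.
            if sum(cells) / len(cells) >= density_thresh:
                left = tx if left is None else min(left, tx)
                top = ty if top is None else min(top, ty)
                right = max(right, min(tx + tile, w))
                bottom = max(bottom, min(ty + tile, h))
    if left is None or top is None:
        return None
    return (left, top, right, bottom)
```

When the second electronic device 200 transmits the location information instead, this analysis would simply be skipped and the received box used directly.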

Subsequently, the first electronic device 300 may store the image displayed on the specific area (S130). For example, the first electronic device 300 may store the image for the presentation material included in the video image. The image may be stored in the memory 160.

The presentation material may include a number of pages or may include a video material. The image displayed on the specific area, which is stored in step S130, may be a still image of the part of the presentation material displayed on the specific area at a particular time point. For example, when the presentation material includes several pages, the image stored in step S130 (hereinafter, referred to as “a first image”) may be an image of the particular page, among the pages included in the presentation material, that is displayed at the time point when step S120 and/or step S130 is performed. When the presentation material includes a video, the image stored in step S130 may be an image of the particular frame, among a plurality of frames included in the video (presentation material), that is displayed at the time point when step S120 and/or step S130 is performed.

The second user, who attends the video conference at the second place, may thus store the image of the presentation material without separately receiving the data for the presentation material that the first user, who conducts the video conference at the first place, provides for the conference. The image of the presentation material may be extracted from the video image provided through the video conference and stored by the electronic device used by the second user, without a cumbersome process such as receiving in advance separate electronic data (for example, electronic files) for the presentation material used for the video conference by the first user.

For example, when the conference is conducted with materials that are difficult to convert into data (for example, samples used to introduce a prototype model), or when the first user has not converted the presentation material used for the conference into electronic data in advance, the presentation material may, according to an embodiment, be converted into image data at the same time as it is used in the conference, so that the second user may later review the presentation material used for the video conference.

The first electronic device 300 determines whether the variation of the image displayed on the specific area is equal to or more than a predetermined threshold (S140), and when the variation is equal to or more than the threshold, the first electronic device 300 may store the image displayed on the specific area (S150). When the variation is less than the threshold, the first electronic device 300 does not separately store the image and continues to monitor whether the variation becomes equal to or more than the threshold (S140).

To perform step S140, the first electronic device 300 receives and displays the video image and continues to monitor the specific area. The first electronic device 300 may continuously perform step S140 and monitor whether there is any change to the presentation material displayed on the specific area (e.g., content displayed on the specific area).

For example, in the case that the first user U1 changes the presentation material from a first material to a second material while performing the conference at the first place, the first electronic device 300 may sense a change to the image displayed on the specific area (S140) and may store the image displayed on the specific area after such change separately from the first image stored in step S130 (S150).

As another example, when the presentation material includes a plurality of pages and the first user U1 changes the presentation material from the Nth page to the (N+1)th page, the first electronic device 300 may sense a change to the image displayed on the specific area (S140) and may store the image displayed on the specific area after such change separately from the first image stored in step S130 (S150). FIG. 6 is a view illustrating a screen viewed after the page of the presentation material displayed on the specific area has changed according to an embodiment of the present invention. Referring to FIGS. 5 and 6, the image displayed on the specific area SA has changed from the first image I1 to the second image I2. According to an embodiment, the first electronic device 300 may store the second image I2 in the memory 160 separately from the first image I1.

As still another example, when the presentation material is a video material, the first electronic device 300 may sense a change to the image displayed on the specific area (S140) and may store the image displayed on the specific area after such change separately from the first image stored in step S130 (S150). For example, an image corresponding to the Nth frame of the video material is stored as the first image, and an image corresponding to the (N+a)th frame, whose variation from the image corresponding to the Nth frame is equal to or more than the predetermined threshold, may be stored in step S150 (where a is an integer equal to or more than 1). For example, when the difference (variation) between the image corresponding to the Nth frame and each of the images corresponding to the (N+1)th and (N+2)th frames is less than the threshold, the first electronic device 300 does not store the images corresponding to the (N+1)th and (N+2)th frames of the video material. However, when the variation between the image corresponding to the Nth frame and the image corresponding to the (N+3)th frame of the video material is equal to or more than the threshold, the first electronic device 300 stores the image corresponding to the (N+3)th frame in step S150. Accordingly, even when the presentation material provided by the first user is a video material, the first electronic device 300 may store the images of frames at which the picture changes significantly, so that the second user may review the presentation material later.

The first electronic device 300 may compare the image displayed in real time on the specific area with the first image stored in step S130 (or, when steps S140 and S150 are repeated as described below, with the most recently stored image) to yield the variation. For example, when the presentation material is a video, the image currently displayed on the specific area corresponds to the Nth frame, and the most recently stored image corresponds to the (N−5)th frame, the first electronic device 300 may compute the variation by comparing the image corresponding to the Nth frame with the image corresponding to the (N−5)th frame rather than with the image corresponding to the (N−1)th frame.
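The monitoring loop of steps S130 to S150, including the comparison against the most recently stored image rather than the previous frame, might be sketched as follows. The description does not define how the variation is measured; this sketch assumes, purely for illustration, the mean absolute pixel difference between two equally sized grayscale images. The names `variation` and `select_stored_frames` are hypothetical.

```python
from typing import List, Sequence


def variation(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed metric: mean absolute pixel difference of two same-size images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def select_stored_frames(frames: List[Sequence[int]], threshold: float) -> List[int]:
    """Return the indices of frames that would be stored in steps S130 and S150."""
    if not frames:
        return []
    stored = [0]              # S130: the first image is always stored
    reference = frames[0]     # the most recently stored image
    for i in range(1, len(frames)):
        # S140: compare with the last *stored* image, not the previous frame,
        # so that small gradual changes accumulate until they reach the threshold.
        if variation(frames[i], reference) >= threshold:
            stored.append(i)  # S150
            reference = frames[i]
    return stored
```

With frames whose pixel values drift 0, 5, 8, 12, 13 and a threshold of 10, this stores indices 0 and 3, mirroring the Nth and (N+3)th frame example above: no consecutive pair differs by 10, but the drift relative to the stored Nth frame eventually does.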

Subsequently, the first electronic device 300 repeats steps S140 and S150 and stores the image displayed on the specific area SA in the memory 160 whenever the variation of the image is equal to or more than the threshold. The first electronic device 300 may store the plurality of images obtained in steps S130 and/or S150 in order of storage.

FIG. 7 is a view illustrating an example of storing a plurality of images displayed on a specific area according to an embodiment of the present invention.

In storing the plurality of images, the first electronic device 300 may number the images stored during the video conference in order of storage, as shown in (a) of FIG. 7, and may store each image in association with its number. For example, (a) of FIG. 7 shows that the first image I1 is stored Nth and the second image I2 is stored (N+1)th, so that the first image I1 corresponds to the value “N” and the second image I2 corresponds to the value “N+1”.

In storing the plurality of images, the first electronic device 300, as shown in (b) of FIG. 7, may obtain time information indicating when each image is stored, on a time line that starts counting when the video conference begins, and may store the obtained time information in association with the respective images. For example, (b) of FIG. 7 shows that the first image I1 is stored one minute and two seconds after the video conference has begun and the second image I2 is stored two minutes and ten seconds after the video conference has begun, so that the first image I1 corresponds to the time information “1 minute and 2 seconds” and the second image I2 corresponds to the time information “2 minutes and 10 seconds”.
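The two bookkeeping schemes of FIG. 7 (a storage number and an elapsed time on the conference time line) can be kept together in one record per stored image. The following is a minimal sketch under the assumption that a monotonic clock value in seconds is available when the conference starts and when each image is stored; the class and field names are hypothetical, and numbering here starts at 1 rather than the abstract N used in the figure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StoredImage:
    number: int        # order of storage, as in (a) of FIG. 7
    elapsed_s: float   # seconds since the conference began, as in (b) of FIG. 7
    pixels: bytes      # the image captured from the specific area SA


@dataclass
class MaterialArchive:
    conference_start_s: float
    images: List[StoredImage] = field(default_factory=list)

    def store(self, pixels: bytes, now_s: float) -> StoredImage:
        """Record an image with its storage number and position on the time line."""
        entry = StoredImage(number=len(self.images) + 1,
                            elapsed_s=now_s - self.conference_start_s,
                            pixels=pixels)
        self.images.append(entry)
        return entry


def format_elapsed(seconds: float) -> str:
    """Render an elapsed time such as 130 s as '2 min 10 s'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes} min {secs} s"
```

For a conference started at clock value 1000.0, storing I1 at 1062.0 and I2 at 1130.0 yields storage numbers 1 and 2 with elapsed times of 1 min 2 s and 2 min 10 s, matching the example of (b) of FIG. 7.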

While performing steps S120 to S150, the first electronic device 300 may continue to display the video image received from the second electronic device 200 on the display unit 151. While continuously performing steps S100 to S150, the first electronic device 300 may receive a predetermined request (S160).

While viewing the video conference hosted by the first user U1, the second user U2 may want to review the presentation material that has just been explained by the first user U1. In this case, the second user U2 may input a predetermined request to the first electronic device 300. Alternatively, the second user U2 may input the predetermined request to another electronic device (not shown) connected to the first electronic device 300 by wire or wirelessly, and the other electronic device may transfer the predetermined request and/or the fact that the predetermined request has been generated to the first electronic device 300. Hereinafter, unless stated otherwise, it is assumed that the second user U2 directly inputs the predetermined request to the first electronic device 300.

The predetermined request may be input by various methods.

FIGS. 8 to 10 are views illustrating methods of obtaining a predetermined request according to embodiments of the present invention.

Referring to FIG. 8, the second user U2 makes a particular gesture (moving his right arm from right to left). The first electronic device 300 may map the gesture shown in FIG. 8 to the predetermined request in advance, and when recognizing the second user's gesture shown in FIG. 8, may determine that the predetermined request has been input.

Referring to FIG. 9, the second user U2 issues a specific voice command (for example, saying “to the previous page” or “to the Nth page”). The first electronic device 300 may map the voice commands shown in FIG. 9 to the predetermined request in advance, and when recognizing that the specific voice command has been input by the second user U2, may determine that the predetermined request has been input.

Referring to FIG. 10, the first electronic device 300 may display a specific control button CB on the display unit 151 that is displaying the video image. The second user U2 may select the control button CB by touch and/or by using a mouse. Upon receiving the selection of the control button CB, the first electronic device 300 may determine that the predetermined request has been input.

Subsequently, the first electronic device 300 may obtain an image corresponding to the received request among the images stored in the memory 160 in steps S130 and/or S150 (S170) and may display the obtained image on the display unit 151 (S180).

Obtaining the image corresponding to the predetermined request in step S170 may be performed by various methods.

For example, when the predetermined request is input by a user's gesture as described in connection with FIG. 8, and the currently displayed image is the Nth stored image, the (N−1)th stored image may be obtained in step S170 when the user makes the gesture shown in FIG. 8 once, and the (N−2)th stored image may be obtained in step S170 when the user makes the gesture twice.

As another example, when the predetermined request is input by a user's voice command (e.g., when the user says “to the previous page”) as described in connection with (a) of FIG. 9, the (N−1)th stored image may be obtained in step S170 when the user issues the voice command once, and the (N−2)th stored image may be obtained in step S170 when the user issues the voice command twice.

As still another example, as described in connection with (b) of FIG. 9, in the case that the predetermined request is input by a user's voice command (e.g., when the user speaks “to the Nth page”), an image corresponding to the page targeted by the user's voice command may be obtained in step S170.

As yet still another example, in the case that the predetermined request is input by the control button CB separately displayed on the display unit 151 as described in connection with FIG. 10, when the control button CB is selected once, the N−1th stored image may be obtained in step S170 (when the currently displayed image is the Nth stored image), and when the control button CB is selected twice, the N−2th stored image may be obtained in step S170.
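The mapping from a request to a stored image in step S170 can be sketched as a small resolver. This sketch assumes that recognition of the gesture, voice command, or button press has already happened upstream and is reported as a request kind, a repeat count, and (for voice) the recognized utterance text. It also assumes one stored image per page, so that "the kth page" is the kth stored image; that equivalence is an illustrative assumption, not something the description guarantees. All names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical vocabulary for spoken ordinals such as "to the first page".
ORDINAL_WORDS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}


def resolve_request(current: int, total: int, kind: str,
                    repeat: int = 1, utterance: str = "") -> Optional[int]:
    """Return the storage number of the image to obtain in S170, or None.

    current: storage number of the currently displayed image (N)
    total:   number of images stored so far
    kind:    "gesture", "button", or "voice"
    """
    text = utterance.lower()
    if kind in ("gesture", "button") or (kind == "voice" and "previous page" in text):
        target = current - repeat  # once -> N-1, twice -> N-2, ...
    elif kind == "voice":
        match = re.search(r"to the (\w+) page", text)
        if match is None:
            return None
        word = match.group(1)
        digits = re.fullmatch(r"(\d+)(?:st|nd|rd|th)?", word)
        if word in ORDINAL_WORDS:
            target = ORDINAL_WORDS[word]
        elif digits:
            target = int(digits.group(1))
        else:
            return None
    else:
        return None
    # Ignore requests that point outside the images actually stored.
    return target if 1 <= target <= total else None
```

Returning `None` for out-of-range requests (for example, "previous page" while the first stored image is shown) leaves the choice of feedback to the caller.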

As described above, the obtained image may be displayed on the display unit 151. The first electronic device 300 may display the obtained image by various methods.

FIGS. 11 and 12 are views illustrating exemplary methods of displaying an obtained image according to embodiments of the present invention.

Referring to FIG. 11, (a) of FIG. 11 illustrates an example where, while the first user U1 holds a video conference with the xth page of a presentation material having a plurality of pages, a video image reflecting the video conference is output through the first electronic device 300. As described above, the xth page I3 of the presentation material may be displayed on the specific area SA. In this case, when the second user U2 inputs a voice command by saying “to the first page”, the first electronic device 300 may keep displaying the video image received from the second electronic device 200 on the display unit 151 while displaying, on the specific area SA, the image I1 corresponding to the first page among the images stored in the memory 160, as shown in (b) of FIG. 11, instead of the presentation material currently received from the second electronic device 200.

Referring to FIG. 12, (a) of FIG. 12 illustrates the same situation as (a) of FIG. 11. The second user U2 may input a voice command by saying “to the second page”. In contrast to FIG. 11, the image I3 corresponding to the presentation material currently received from the second electronic device 200 continues to be displayed on the specific area SA, while the image I2 corresponding to the second page, which is obtained in response to the predetermined request, may be displayed on a region R of the display unit 151. Accordingly, the second user U2 may review the previous page of the presentation material while continuing to view the video conference held by the first user U1.
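The two display options of FIGS. 11 and 12 differ only in where the obtained image is drawn over the live video. A minimal compositing sketch is shown below, assuming frames are 2-D pixel arrays, that boxes are given as (left, top, right, bottom), and that the stored image has already been scaled to the target box; the function name `compose` and the mode strings are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def compose(live: Image, stored: Image, sa: Box, mode: str,
            region: Optional[Box] = None) -> Image:
    """Overlay a stored image on a copy of the live frame (step S180)."""
    out = [row[:] for row in live]  # the live video frame itself is left untouched
    if mode == "replace":           # FIG. 11: stored page replaces the content of SA
        left, top, right, bottom = sa
    elif mode == "region" and region is not None:  # FIG. 12: shown in region R
        left, top, right, bottom = region
    else:
        raise ValueError("unknown mode or missing region")
    if (bottom - top, right - left) != (len(stored), len(stored[0])):
        raise ValueError("stored image must already be scaled to the target box")
    for dy, row in enumerate(stored):
        out[top + dy][left:right] = row
    return out
```

In the "replace" mode the live presentation content in SA is hidden for as long as the stored page is shown, whereas the "region" mode keeps SA live, which is the trade-off between FIG. 11 and FIG. 12.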

As such, the second user may review the previous pages of the presentation material used for the video conference hosted by the first user while the video conference is in progress, which makes the video conference more efficient.

A method of controlling an electronic device according to another embodiment of the present invention is now described.

FIG. 13 is a view schematically illustrating an environment to which an embodiment of the present invention applies, and FIG. 14 is a flowchart illustrating a method of controlling an electronic device according to an embodiment of the present invention.

Referring to FIG. 13, third to sixth users U3, U4, U5, and U6 attend a video conference through a third electronic device 400 located at a third place, and seventh and eighth users U7 and U8 attend the video conference through a fourth electronic device 500 located at a fourth place. In this embodiment described in connection with FIG. 13, in addition to the electronic devices 200 and 300 as described in connection with FIG. 3, more electronic devices participate in the video conference. However, this is merely an example for ease of description, and the embodiments of the present invention are not limited thereto.

In the environment as illustrated in FIG. 13, the “electronic device” refers to the first electronic device 300 located at the second place, and the “(an)other electronic device(s)” refer(s) to at least one of the second electronic device 200 located at the first place, the third electronic device 400 located at the third place, and the fourth electronic device 500 located at the fourth place unless stated otherwise.

As shown in FIG. 14, the control method may include a step of receiving multimedia data obtained by the electronic devices 200, 400, and 500 (S200), a step of sensing a human voice from the received multimedia data at a first time point (S210), a step of identifying a first speaker corresponding to the sensed voice (S220), a step of obtaining information relating to the identified speaker (S230), and a step of storing the obtained information in association with the first time point (S240).

According to an embodiment, the control method may further include a step of determining whether the speaker corresponding to the human voice included in the multimedia data changes from the first speaker to a second speaker (S250), a step of, when the speaker changes to the second speaker, identifying the second speaker (S260), a step of obtaining information relating to the second speaker (S270), and a step of storing the obtained information in association with a second time point at which the speaker changes to the second speaker (S280). The information relating to the first speaker and/or second speaker may include personal information of each speaker, information on the place where each speaker is positioned during the video conference, and keywords included in the speech each speaker makes. Each step is now described in greater detail.

FIGS. 15 to 19 are views illustrating embodiments of the present invention.

The first electronic device 300 may receive multimedia data obtained by the other electronic devices 200, 400, and 500 (S200). The first electronic device 300 may receive first multimedia data obtained for the first user U1 attending the video conference at the first place, third multimedia data obtained for the third to sixth users U3, U4, U5, and U6 attending the video conference at the third place, and fourth multimedia data obtained for the seventh and eighth users U7 and U8 attending the video conference at the fourth place.

The multimedia data may include video data reflecting in real time each user attending the video conference and images surrounding the users as well as audio data reflecting in real time each user attending the video conference and sound surrounding the users.

The first electronic device 300 may output the received multimedia data through the output unit 150 in real time. For example, the video data included in the multimedia data may be displayed on the display unit 151, and the audio data included in the multimedia data may be audibly output through the sound output unit 152. According to an embodiment, while the received multimedia data is displayed, the second multimedia data obtained for the second user U2 and his surroundings directly by the camera 121 and/or microphone 122 of the first electronic device 300 may also be displayed. Hereinafter, unless stated otherwise, the “multimedia data obtained by the first electronic device 300” includes both the multimedia data obtained by and received from the electronic devices 200, 400, and 500 and the multimedia data directly obtained by the user input unit 120 of the first electronic device 300.

The received multimedia data may be stored in the memory 160.

The first electronic device 300 may display all of the video data clips included in the multimedia data obtained by the first electronic device 300, or at least one selected video data clip, on the display unit 151. Likewise, all of the audio data clips included in the multimedia data, or at least one selected audio data clip, may be output through the sound output unit 152. As used herein, the “multimedia data clip” may refer to part of the multimedia data, the “video data clip” may refer to part of the video data, and the “audio data clip” may refer to part of the audio data.

As described above, while outputting the multimedia data obtained by the first electronic device 300 through the output unit 150, the first electronic device 300 may sense a human voice by analyzing the audio data included in at least one multimedia data clip of the multimedia data (S210). Hereinafter, the time when the human voice is sensed in step S210 is referred to as “first time point”.

For example, as shown in FIG. 15 and (a) of FIG. 16, the fifth user U5 positioned at the third place may start to speak at a first time point. The first electronic device 300 may sense that the fifth user has started to speak at the first time point by analyzing the multimedia data (in particular, audio data) received from the electronic device 400. (b) of FIG. 16 illustrates an example where the multimedia data is output through the first electronic device 300.

Subsequently, the first electronic device 300 may identify a speaker corresponding to the sensed voice (S220). For example, the first electronic device 300 may identify which user has generated the voice among the first to eighth users U1, U2, U3, U4, U5, U6, U7, and U8 that attend the video conference. To identify the speaker corresponding to the sensed voice, the first electronic device 300 may identify which electronic device has sent the multimedia data including the voice among the electronic devices 200, 400, and 500. For example, in the case that the voice is included in the multimedia data received from the second electronic device 200 located at the first place, the first electronic device 300 may determine that the speaker corresponding to the voice is the first user U1 located at the first place.

According to an embodiment, the first electronic device 300 may analyze the video data included in the multimedia data to identify the speaker corresponding to the sensed voice. According to an embodiment, the first electronic device 300 may analyze images for respective users reflected by the video data after the voice has been sensed to determine which user has generated the voice. For example, the first electronic device 300 may determine the current speaker by recognizing each user's face and analyzing the recognized face (e.g., each user's lips). For example, as shown in (b) of FIG. 16, the first electronic device 300 may analyze each user's face and when recognizing that the fifth user's lips move, may determine that the fifth user U5 is the current speaker SP1.

The first electronic device 300 may use both the method of identifying the electronic device that has sent the multimedia data and the method of analyzing the video data included in the multimedia data to identify the speaker corresponding to the sensed voice.
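Combining the two identification methods can be sketched as a two-stage decision: the sending device narrows the candidates to the attendees at one place, and a lip-movement score from face analysis picks among several candidates at that place. The roster dictionary, the per-user `lip_motion` scores, and `motion_thresh` are hypothetical inputs assumed to come from earlier processing; the description does not specify how lip movement is scored.

```python
from typing import Dict, List, Optional


def identify_speaker(source_device: int,
                     attendees_by_device: Dict[int, List[str]],
                     lip_motion: Dict[str, float],
                     motion_thresh: float = 0.5) -> Optional[str]:
    """Sketch of step S220: pick the speaker of a sensed voice, or None if unclear."""
    candidates = attendees_by_device.get(source_device, [])
    if len(candidates) == 1:
        # Only one attendee at that place (e.g. U1 at device 200): the device decides.
        return candidates[0]
    # Several attendees at that place: choose the one whose lips move the most,
    # but only if that movement is clearly above the (assumed) threshold.
    best = max(candidates, key=lambda user: lip_motion.get(user, 0.0), default=None)
    if best is None or lip_motion.get(best, 0.0) < motion_thresh:
        return None
    return best
```

With the FIG. 13 arrangement, a voice arriving from device 200 resolves to U1 without any video analysis, while a voice from device 400 requires the lip-movement scores to decide among U3 to U6.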

Steps S210 and/or S220 need not be performed by the first electronic device 300. According to an embodiment, step S210 and/or S220 may be performed by the other electronic devices 200, 400, and 500. For example, each electronic device 200, 400, or 500 may determine whether a voice is included in the multimedia data it obtains, and if a voice is included, may analyze the video data included in the multimedia data to determine the speaker corresponding to the sensed voice as described above. If step S210 and/or S220 is performed by each of the electronic devices 200, 400, and 500, information on the speaker determined by the electronic devices 200, 400, and 500 may be transmitted to the first electronic device 300 along with the multimedia data. Information on the time point when the voice was sensed may also be transmitted to the first electronic device 300.

The first electronic device 300 may then obtain information relating to the identified speaker (S230). Such information may take various forms. For example, the information may include personal information on the speaker, information on the place where the speaker was positioned during the video conference, and keywords the speaker has spoken.

The personal information on the speaker may be obtained from a database, stored in advance, of personal information on the conference attendees. The database may be provided in the first electronic device 300 or in at least one of the electronic devices 200, 400, and 500, or may be distributed across the electronic devices 200, 300, 400, and 500. Alternatively, the database may be provided in a server (not shown) connected over a communication network. The personal information may include, e.g., the names, job positions, and divisions of the conference attendees.

The first electronic device 300 may receive information on the place where the speaker is located from the electronic devices 200, 400, and 500. Alternatively, the first electronic device 300 may obtain the place information based on the IP addresses used by the electronic devices 200, 400, and 500 for the video conference. The place information may include any information that conceptually distinguishes one place from another, as well as information, such as an address, that specifies a geographic location. For example, the place information may include an address, such as “xxx, Yeoksam-dong, Gangnam-gu, Seoul”, a team name, such as “Financial Team” or “IP group”, a branch name, such as “US branch of XX company” or “Chinese branch of XX company”, or a company name, such as “A corporation” or “B corporation”.

While the identified speaker makes a speech, the first electronic device 300 may analyze what the speaker talks about, determine the words, phrases, or sentences that are repeatedly spoken, and treat them as keywords of the speech. The first electronic device 300 may directly analyze the audio data reflecting what the speaker has said, or may convert the audio data into text data through an STT (Speech-To-Text) engine and analyze the converted text data.
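For the text-based variant, a repetition-based keyword picker over the STT output might look like the following sketch. It assumes the STT engine has already produced an English transcript, counts word unigrams and bigrams, skips n-grams that begin or end with a small hypothetical stop-word list, and prefers longer phrases so that a repeated phrase is not also reported as its individual words. The parameters `top_k`, `min_count`, and `max_n` are illustrative choices.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately small stop-word list for the illustration.
STOPWORDS = {"a", "an", "and", "is", "of", "our", "the", "to", "we", "with"}


def extract_keywords(transcript: str, top_k: int = 3,
                     min_count: int = 2, max_n: int = 2) -> List[str]:
    """Return up to top_k phrases spoken at least min_count times."""
    words = re.findall(r"[a-z0-9\-]+", transcript.lower())
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    # Most frequent first; among ties, longer phrases first; then alphabetical.
    ranked = sorted((g for g, c in counts.items() if c >= min_count),
                    key=lambda g: (-counts[g], -len(g.split()), g))
    chosen: List[str] = []
    for gram in ranked:
        # Skip a phrase whose words are all covered by an already chosen phrase.
        if any(set(gram.split()) <= set(c.split()) for c in chosen):
            continue
        chosen.append(gram)
        if len(chosen) == top_k:
            break
    return chosen
```

A production system would more likely weight terms against a background corpus, but plain repetition counting is what the paragraph above describes.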

Subsequently, the first electronic device 300 may store the obtained information so that the obtained information corresponds to the first time point (S240).

The first time point may be information specifying a time on a time line whose counting commences when the video conference begins. For example, if the fifth user U5 starts to speak 15 seconds after the video conference has commenced, the first time point may be “15 seconds”.

In the example described in connection with FIGS. 15 and 16, if the fifth user U5, who is recognized as the first speaker SP1, is named “Mike”, belongs to the “Intellectual Property Strategy Group” (or simply “IP group”), holds the position of “Manager”, attends the video conference at the “Third place”, and has spoken the keyword “Tele-presence Technology”, the first electronic device 300 may store “Mike”, “Intellectual Property Strategy Group”, “Manager”, “Third place”, and “Tele-presence Technology” in association with the first time point.

Hereinafter, the set of information (e.g., personal information of users, place information, keywords, etc.) stored in correspondence with the time point when a speaker begins to speak (e.g., the first time point in the above example) is referred to as “metadata”, and the name, division, position, place information, and keyword are referred to as fields of the metadata. In the above example, the video conference metadata includes the personal information, place information, and keywords. However, this is merely an example, and according to an embodiment, other fields may be added to the video conference metadata.
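The metadata described above can be modeled as a list of records, each keyed by a time point on the conference time line. The following is a minimal sketch of step S240 under that assumption. The class and field names are hypothetical, and the wall-clock times in the example are made up to reproduce the “15 seconds” case.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class MetadataEntry:
    """One metadata record, keyed by the time point at which a speaker starts to speak."""
    time_point_s: float   # seconds on a time line that starts at 0 when the conference begins
    name: str
    division: str
    position: str
    place: str
    keywords: list = field(default_factory=list)

def elapsed_seconds(conference_start: datetime, speech_start: datetime) -> float:
    """Convert a wall-clock speech start into a position on the conference time line."""
    return (speech_start - conference_start).total_seconds()

class ConferenceMetadata:
    """Holds metadata entries ordered by time point (hypothetical container)."""

    def __init__(self):
        self.entries = []

    def store(self, entry: MetadataEntry) -> None:
        self.entries.append(entry)
        self.entries.sort(key=lambda e: e.time_point_s)

# Example mirroring the "Mike" case; the wall-clock times are illustrative only.
start = datetime(2024, 1, 1, 10, 0, 0)
meta = ConferenceMetadata()
meta.store(MetadataEntry(
    time_point_s=elapsed_seconds(start, datetime(2024, 1, 1, 10, 0, 15)),
    name="Mike",
    division="Intellectual Property Strategy Group",
    position="Manager",
    place="Third place",
    keywords=["Tele-presence Technology"],
))
```

Storing relative seconds rather than wall-clock times keeps the time points comparable across recordings that start at different moments.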

According to an embodiment, the control method may continue to monitor whether the speaker corresponding to the human voice included in the multimedia data changes from the current speaker (e.g., the fifth user U5 in the above example) to another user (S250). In the example illustrated in FIG. 16, the speaker is the fifth user U5. However, while the video conference is in progress, the speaker may change to the fourth user U4 as shown in FIG. 17. Accordingly, the first electronic device 300 may keep monitoring for any change of the speaker.

The first electronic device 300 may determine whether the speaker changes by analyzing the audio data included in the multimedia data received from the electronic devices 200, 400, and 500. For example, the first electronic device 300 may determine whether a human voice included in the audio data is identical to the previous voice, and if not, may determine that the speaker has changed.
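One common way to decide whether a voice is “identical to the previous voice” is to compare fixed-length voice embeddings (for example, vectors produced by a speaker-recognition model) and flag a change when their similarity falls below a threshold. The sketch below assumes such embeddings are already available. The embedding source, the 0.8 threshold, and the function names are hypothetical and are not specified in the description.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def speaker_changed(previous, current, threshold=0.8):
    """Treat the voice as a new speaker when it is not similar enough to the previous one."""
    if previous is None:  # first voice of the conference: always a "new" speaker
        return True
    return cosine_similarity(previous, current) < threshold
```

The threshold trades missed speaker changes against spurious ones, so in practice it would be tuned to the embedding model in use.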

If it is determined in step S250 that the speaker has changed, the first electronic device 300 may identify the new speaker (S260). For convenience of description, the speaker identified as speaking after the first speaker SP1, who was the first to speak after the video conference commenced, is referred to as the “second speaker SP2”. In the example illustrated in FIGS. 16 and 17, the first speaker SP1 is the fifth user U5, and the second speaker SP2 is the fourth user U4.

Step S260 may be performed by the same or substantially the same method as step S220. To identify the speaker corresponding to the sensed voice, the first electronic device 300 may identify which of the electronic devices 200, 400, and 500 has sent the multimedia data including the voice, may analyze the video data included in the multimedia data, or may do both.

As with steps S210 and/or S220, steps S250 and/or S260 need not be performed by the first electronic device 300. Each of the electronic devices 200, 400, and 500 may perform step S250 and/or S260.

Subsequently, the first electronic device 300 may obtain information relating to the identified second speaker (S270) and may store the obtained information so that it corresponds to the time point when the speaker changed (e.g., the second time point) (S280). Steps S270 and S280 may be performed in the same or a similar manner as steps S230 and S240.

Thereafter, the first electronic device 300 may repeatedly perform steps S250 to S280. Accordingly, whenever the speaker in the video conference changes, the first electronic device 300 may obtain information on the new speaker and store that information in correspondence with the time when the change occurred. For example, when the speaker changes from the fourth user U4 to the seventh user U7 as shown in FIG. 18 and then to the second user U2 as shown in FIG. 19, the first electronic device 300 may repeat steps S250 to S280 and store information relating to each speaker.
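The repetition of steps S250 to S280 can be sketched as a single loop over time-ordered speaker labels, for example the output of a diarization stage. This sketch assumes such labels already exist, and the input format and the name `build_metadata` are hypothetical. Because the current speaker starts as `None`, the first speaker is recorded by the same loop.

```python
def build_metadata(segments, directory):
    """Record speaker information at each time point where the speaker changes.

    segments: (time_s, speaker_id) pairs in time order (hypothetical input format).
    directory: speaker_id -> dict of personal and place information.
    """
    metadata = []
    current = None
    for time_s, speaker_id in segments:
        if speaker_id != current:                  # S250: has the speaker changed?
            current = speaker_id                   # S260: identify the new speaker
            info = directory.get(speaker_id, {})   # S270: obtain speaker information
            metadata.append({"time_s": time_s, "speaker": speaker_id, **info})  # S280
    return metadata
```

With segments whose speakers change at 15, 150, 400, and 590 seconds (the FIG. 20 times of 15 s, 2 min 30 s, 6 min 40 s, and 9 min 50 s), the sketch yields four entries at exactly those time points. Consecutive segments from the same speaker add nothing.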

Referring to FIG. 20, the first speaker since the beginning of the video conference is Mike, a manager in the IP group, who starts to speak 15 seconds after the video conference has commenced. Mike attends the video conference at the third place, and a keyword Mike mentions is ‘Tele-presence Technology’. The second speaker is Pitt, an assistant in the IP group, who begins to speak 2 minutes and 30 seconds after the video conference has begun. Pitt attends the video conference at the third place, and a keyword Pitt mentions is ‘Technology Trend’. The third speaker is Jack, a chief research engineer in the first research center, who starts to speak 6 minutes and 40 seconds after the video conference has begun. Jack attends the video conference at the fourth place, and a keyword he mentions is ‘Future Prediction’. The fourth speaker is Giggs, a senior research engineer in the second research center, who starts to speak 9 minutes and 50 seconds after the video conference has commenced. Giggs attends the video conference at the second place, and a keyword he mentions is ‘Natural’.

As such, the first electronic device 300 may continue to monitor the speakers during the course of the video conference and may store various types of information on the speakers so that the information corresponds to the time points when the speakers begin to speak, thereby generating video conference metadata. The video conference metadata may be used in various ways to enhance user convenience. For example, it may be used to prepare brief minutes of the video conference for the attendees, or to provide a search function that allows the attendees to review the conference.

As described above, the first electronic device 300 may output the received multimedia data in real time through the output unit 150 while generating the video conference metadata. For example, the video data included in the multimedia data may be displayed on the display unit 151, and the audio data included in the multimedia data may be audibly output through the sound output unit 152. While the received multimedia data is output, the second multimedia data capturing the second user U2 and his surroundings, obtained directly by the camera 121 and/or microphone 122 of the first electronic device 300, may be output as well. The first electronic device 300 may display all of the video data included in the multimedia data obtained by the first electronic device 300 on the display unit 151 simultaneously.

According to an embodiment, when identifying the current speaker in step S210 and/or S260, the first electronic device 300 may identify, among the plurality of multimedia data clips obtained and transmitted by the electronic devices 200, 400, and 500, the multimedia data clip that includes the identified speaker, and may output that clip through the output unit 150 in a manner different from that used for the other multimedia data clips.

For instance, the first electronic device 300 may display the identified multimedia data clip (hereinafter referred to as the “multimedia data clip for speaker”) so that it appears larger than the other multimedia data clips (hereinafter referred to as the “multimedia data clips for listener”). For example, as shown in (b) of FIGS. 16 to 19, the multimedia data clip for speaker is displayed on the first region R1 in a larger size than the multimedia data clips for listener, which are displayed in a smaller size on the second region R2. Referring to (b) of FIG. 16, in which the speaker is the fifth user U5, the first electronic device 300 may display a screen image S3 for the multimedia data clip including the fifth user U5 (e.g., the multimedia data clip for speaker) on the first region R1 and may display screen images S1, S4, and S2 for the remaining multimedia data clips for listener on the second region R2. Referring to FIGS. 17 to 19, the screen images S3, S4, and S2, each including the speaker, are likewise displayed on the first region R1.
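The region assignment described above reduces to a simple partition of the clip identifiers. The following minimal sketch uses the screen-image labels from FIG. 16 as clip identifiers. The function name and the dictionary layout are hypothetical.

```python
def assign_regions(clip_ids, speaker_clip):
    """Put the speaker's clip in the large region R1 and all other clips in R2."""
    if speaker_clip not in clip_ids:
        raise ValueError(f"unknown clip: {speaker_clip}")
    return {
        "R1": [speaker_clip],
        "R2": [c for c in clip_ids if c != speaker_clip],  # keeps the original order
    }
```

For example, `assign_regions(["S1", "S2", "S3", "S4"], "S3")` places S3 in R1 and S1, S2, and S4 in R2. Calling it again whenever step S260 identifies a new speaker moves the new speaker's clip into R1.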

As another example, the first electronic device 300 may output both the video data and the audio data included in the multimedia data clip for speaker, while outputting only the video data, and not the audio data, included in the multimedia data clips for listener. For example, all of the video data included in the plurality of multimedia data clips may be displayed on the display unit 151, whereas only the audio data included in the multimedia data clip for speaker may be selectively output through the sound output unit 152.
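Selecting only the speaker's audio can be viewed as mixing one block of samples per clip with per-clip gains. The speaker's clip is played at full level and the listener clips are muted. This is a minimal sketch, assuming each clip delivers equal-length blocks of float samples. The `listener_gain` parameter is a hypothetical generalization. A value of 0.0 reproduces the behavior described above, and a small positive value would attenuate listener audio instead of muting it.

```python
def mix_audio(frames, speaker_clip, listener_gain=0.0):
    """Mix one block of audio samples from several clips.

    frames: clip_id -> list of float samples (all blocks the same length).
    The speaker's clip plays at full level; listener clips are scaled by listener_gain.
    """
    length = len(next(iter(frames.values())))
    mixed = [0.0] * length
    for clip_id, samples in frames.items():
        gain = 1.0 if clip_id == speaker_clip else listener_gain
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    return mixed
```
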

As still another example, the first electronic device 300 may output through the output unit 150 only the video and audio data of the multimedia data clip for speaker, while receiving the multimedia data clips for listener from the electronic devices 200, 400, and 500 and storing them in the memory 160 without outputting them through the display unit 151 or the sound output unit 152.

Although it has been described that the control method is performed by the first electronic device 300 located at the second place, the embodiments of the present invention are not limited thereto. For example, according to an embodiment, the control method may be performed by each of the electronic device 200 located at the first place, the electronic device 400 located at the third place, and the electronic device 500 located at the fourth place.

By identifying the speaker and outputting the multimedia data clip corresponding to the speaker differently from the other multimedia data clips, attention can be directed toward the user who is speaking in the video conference, allowing the video conference to proceed more efficiently.

Hereinafter, a method of controlling an electronic device according to an embodiment of the present invention is described. The metadata described above may be used to search the video conference multimedia data for specific time points. For ease of description, the details described in connection with FIGS. 13 to 20 may apply to the following embodiments. However, the control method described below is not limited to being performed based on the video conference metadata described above.

FIG. 21 is a flowchart illustrating a method of controlling an electronic device according to an embodiment of the present invention.

Referring to FIG. 21, the control method may include a step of receiving a predetermined input (S300), a step of obtaining a time point according to the received input (S310), and a step of outputting multimedia data corresponding to the obtained time point (S320). According to an embodiment, during or after the video conference, specific time points of the video conference multimedia data may be searched to review the conference, and the multimedia data corresponding to the searched time points may be output. Search conditions may be defined by the predetermined input. Each of the steps is described below in greater detail.

The first electronic device 300 may receive a predetermined input (S300). The predetermined input, which specifies the time points to be searched in the video conference multimedia data stored in the first electronic device 300, may be received by various methods. Any method capable of receiving the search conditions may be used for entering the predetermined input.

FIG. 22 illustrates examples of receiving the predetermined input by various methods according to an embodiment of the present invention.

For example, referring to (a) of FIG. 22, when receiving the predetermined input, the first electronic device 300 may display input windows F1, F2, F3, F4, and F5 respectively corresponding to the fields of the video conference metadata, and a user may enter the predetermined input through the input windows F1 to F5 using a predetermined input method (for example, a keyboard), so that the first electronic device 300 receives the predetermined input. In (a) of FIG. 22, the user enters “Jack” as a search condition in the input window corresponding to the “Name” field of the video conference metadata.

As another example, the first electronic device 300 may receive the predetermined input using a touch input method. Referring to (b) of FIG. 22, the user U2 touches “Jack” (e.g., the fourth user U4) included in the screen image corresponding to the third place. This touch enters “Jack” as a search condition.

As still another example, the first electronic device 300 may receive the predetermined input using voice recognition. Referring to (c) of FIG. 22, the user U2 generates a voice command by saying “search Jack!” Then, “Jack” is entered as a search condition.

According to an embodiment, a combination of the above-described methods may be used to receive the predetermined input. For example, if the user touches the input window F1 corresponding to the “Name” field and then says “search Jack!” while the screen of (a) of FIG. 22 is displayed, the first electronic device 300 may receive the predetermined input.
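The combined touch-and-voice input can be reduced to a (field, value) search condition. The following minimal sketch assumes the voice command has already been converted to text and follows a “search <term>” pattern. The grammar, the field names, and `parse_search_input` are hypothetical. A field of `None` means that no input window was touched, so any metadata field may match.

```python
import re
from typing import Optional, Tuple

# Hypothetical command grammar: "search <term>" with optional trailing punctuation.
_COMMAND = re.compile(r"^\s*search\s+(.+?)[!.?]?\s*$", re.IGNORECASE)

def parse_search_input(utterance: str,
                       selected_field: Optional[str] = None) -> Optional[Tuple[Optional[str], str]]:
    """Turn recognized speech (plus an optionally touched field) into (field, value).

    Returns None when the utterance is not a search command.
    """
    match = _COMMAND.match(utterance)
    if match is None:
        return None
    return selected_field, match.group(1).strip()
```

For example, touching the “Name” window and saying “search Jack!” gives `("name", "Jack")`, and the voice command alone gives `(None, "Jack")`.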

Subsequently, the first electronic device 300 may obtain time information corresponding to the received input (S310). The time information may specify a time point on a time line that starts counting when the video conference begins. For example, according to an embodiment, the “time information” described in connection with FIGS. 21 to 25 may be the same or substantially the same as the time information (e.g., the first and second time points) described in connection with FIGS. 13 to 20. That is, the first electronic device 300 may determine a search condition from the predetermined input received in step S300 and may obtain the time information corresponding to that search condition.

For example, when the video conference metadata has been generated and stored as described in connection with FIGS. 13 to 20, the first electronic device 300 may receive “Jack” as a search condition from a user through the predetermined input described in connection with FIG. 22 and may extract, from the video conference metadata, the time point mapped to the information corresponding to “Jack”. The extracted information is the time information.
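Step S310 amounts to a lookup over the metadata records. The following minimal sketch holds the FIG. 20 metadata as a list of dictionaries, with time points converted to seconds, and returns the time points of every record that matches the search condition. The record layout, the case-insensitive exact match, and the name `find_time_points` are assumptions made for illustration.

```python
# Sample metadata following FIG. 20 (time points in seconds since the conference began).
METADATA = [
    {"time_s": 15, "name": "Mike", "division": "IP group", "position": "Manager",
     "place": "Third place", "keyword": "Tele-presence Technology"},
    {"time_s": 150, "name": "Pitt", "division": "IP group", "position": "Assistant",
     "place": "Third place", "keyword": "Technology Trend"},
    {"time_s": 400, "name": "Jack", "division": "First research center",
     "position": "Chief research engineer", "place": "Fourth place", "keyword": "Future Prediction"},
    {"time_s": 590, "name": "Giggs", "division": "Second research center",
     "position": "Senior research engineer", "place": "Second place", "keyword": "Natural"},
]

def find_time_points(metadata, value, field=None):
    """Return time points of records whose field (or any field, if field is None) equals value."""
    target = value.casefold()
    hits = []
    for entry in metadata:
        fields = [field] if field else [k for k in entry if k != "time_s"]
        if any(str(entry.get(f, "")).casefold() == target for f in fields):
            hits.append(entry["time_s"])
    return hits
```

Searching the “Name” field for “Jack” returns `[400]`, that is, 6 minutes and 40 seconds. Searching all fields for “IP group” returns `[15, 150]`.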

Then, the first electronic device 300 may output the multimedia data corresponding to the obtained time point (S320). For example, the first electronic device 300 may store the multimedia data relating to the video conference, retrieve the stored multimedia data, and output it through the sound output unit 152 and/or the display unit 151, starting from the part corresponding to the time information obtained in step S310.
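Starting playback “from the part corresponding to the time information” requires locating the first recorded frame at or after the obtained time point. The following minimal sketch assumes that each stored frame carries an ascending timestamp on the conference time line and uses a binary search. The storage layout and the function name are hypothetical.

```python
import bisect

def frames_from(timestamps, frames, time_point_s):
    """Return the recorded frames from the first one at or after time_point_s.

    timestamps: ascending capture times in seconds since the conference began, one per frame.
    """
    start = bisect.bisect_left(timestamps, time_point_s)
    return frames[start:]
```

A binary search keeps the seek cost logarithmic in the length of the recording and does not assume a constant frame rate.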

If steps S300 and S310 are performed through the first electronic device 300 while the video conference is still in progress, step S320 may be performed in various ways, as follows. Hereinafter, for convenience of description, the multimedia data for the video conference currently in progress is referred to as the “current multimedia data”, and the multimedia data corresponding to the time information obtained in steps S300 and S310 is referred to as the “past multimedia data”.

FIGS. 23 to 25 are views illustrating examples of outputting the current and past multimedia data according to an embodiment of the present invention.

According to an embodiment, the first electronic device 300 may display both the video data included in the current multimedia data and the video data included in the past multimedia data on the display unit 151, and may output only the audio data included in the current multimedia data through the sound output unit 152, without outputting the audio data included in the past multimedia data. Referring to FIG. 23, the screen image S5 for the current multimedia data is displayed on the third region R3 of the display unit 151, and the screen image S6 for the past multimedia data is displayed on the fourth region R4 of the display unit 151, while only the audio data of the current multimedia data is output. If the current multimedia data includes a plurality of multimedia data clips, the screen image displayed on the third region R3 may correspond to the multimedia data clip including the speaker currently speaking in the conference, while the other multimedia data clips may be displayed on the second region R2.

According to an embodiment, the first electronic device 300 may display the video data included in the current multimedia data on a region of the display unit 151 and may output the audio data included in the current multimedia data through the sound output unit 152. The video data included in the past multimedia data is not displayed, and text data converted from the audio data included in the past multimedia data may be displayed on another region of the display unit 151. Referring to FIG. 24, the video data included in the current multimedia data is displayed on the third region R3 of the display unit 151, and the text data converted from the audio data included in the past multimedia data is displayed on the fourth region R4 of the display unit 151.

If the current multimedia data includes a plurality of multimedia data clips, the screen image displayed on the third region R3 may correspond to the multimedia data clip including the speaker currently speaking in the conference, while the other multimedia data clips are displayed on the second region R2. Referring to FIG. 25, the video data included in the current multimedia data is displayed on the second region R2 of the display unit 151, and text data converted from the audio data included in the past multimedia data is displayed on the fifth region R5 of the display unit 151. According to an embodiment, the other multimedia data clips may also be displayed on the second region R2, and the multimedia data clip including the current speaker may be highlighted.

According to an embodiment, the first electronic device 300 may output the audio data included in the current multimedia data through the sound output unit 152 while not outputting the audio data included in the past multimedia data and may display the video data included in the past multimedia data on the display unit 151 while not displaying the video data included in the current multimedia data.

According to an embodiment, the first electronic device 300 may output the audio data included in the past multimedia data through the sound output unit 152 while not outputting the audio data included in the current multimedia data and may display the video data included in the current multimedia data on the display unit 151 while not displaying the video data included in the past multimedia data.
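The output options described above differ only in which parts of the current and past multimedia data are presented. The following minimal sketch encodes them as a lookup table. The mode names are hypothetical labels, and the table is a summary of the described embodiments rather than an exhaustive list. In every mode exactly one stream feeds the sound output unit, and the helper checks that property.

```python
from enum import Enum

class ReviewMode(Enum):
    """Ways to present current and past multimedia data together (hypothetical names)."""
    BOTH_VIDEOS = "both videos, current audio"                # FIG. 23
    PAST_AS_TEXT = "current video and audio, past as text"    # FIGS. 24 and 25
    PAST_VIDEO = "past video, current audio"
    PAST_AUDIO = "current video, past audio"

# For each mode: which outputs are presented from the current and the past data.
OUTPUTS = {
    ReviewMode.BOTH_VIDEOS: {"current": {"video", "audio"}, "past": {"video"}},
    ReviewMode.PAST_AS_TEXT: {"current": {"video", "audio"}, "past": {"text"}},
    ReviewMode.PAST_VIDEO: {"current": {"audio"}, "past": {"video"}},
    ReviewMode.PAST_AUDIO: {"current": {"video"}, "past": {"audio"}},
}

def audio_source(mode):
    """Return which stream ("current" or "past") feeds the sound output unit."""
    sources = [s for s, outs in OUTPUTS[mode].items() if "audio" in outs]
    if len(sources) != 1:
        raise ValueError(f"{mode} must play exactly one audio stream")
    return sources[0]
```
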

Alternatively, the first electronic device 300 may output the current and past multimedia data in various other ways.

As such, during or after the video conference, specific time points of the video conference multimedia data may be searched to review the conference, and the multimedia data corresponding to the searched time points may be output.

In the methods of controlling an electronic device according to the embodiments, not every step is essential, and according to an embodiment, the steps may be selectively included. The steps need not be performed in the order described above, and according to an embodiment, a later step may be performed before an earlier step.

The steps in the methods of controlling an electronic device may be performed separately or in combination thereof. According to an embodiment, steps in a method may be performed in combination with steps in another method.

The methods of controlling an electronic device may be stored in a computer readable medium in the form of codes or a program for performing the methods.

The invention has been explained above with reference to exemplary embodiments. It will be evident to those skilled in the art that various modifications may be made thereto without departing from the broader spirit and scope of the invention. Further, although the invention has been described in the context of its implementation in particular environments and for particular applications, those skilled in the art will recognize that the present invention's usefulness is not limited thereto and that the invention can be beneficially utilized in any number of environments and implementations. The foregoing description and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

What is claimed is:
 1. An electronic device comprising: a memory; a communication unit configured to receive a video image streamed from a first electronic device; a display unit configured to display the video image; and a control unit configured to identify a specific area included in the video image, to store a first image displayed on the specific area at a first time point in the memory, and to store in the memory a second image displayed on the specific area at a second time point when a variation of an image displayed in the specific area is equal to or more than a predetermined threshold.
 2. The electronic device of claim 1, wherein the control unit is configured to store the first image so that the first image corresponds to the first time point and to store the second image so that the second image corresponds to the second time point.
 3. The electronic device of claim 1, wherein the first and second images are still images.
 4. The electronic device of claim 1, wherein the control unit is configured to determine the variation of the image displayed on the specific area based on a variation between the image displayed on the specific area and the first image stored in the memory.
 5. The electronic device of claim 1, wherein the control unit is configured to analyze the video image to identify the specific area.
 6. The electronic device of claim 1, wherein the control unit is configured to receive information on the specific area from the first electronic device through the communication unit and to identify the specific area based on the received information.
 7. An electronic device comprising: a memory; a communication unit configured to receive a video image streamed from a first electronic device; a display unit configured to display the video image; and a control unit configured to identify a specific area included in the video image, to store in the memory a still image reflecting a content displayed on the specific area whenever the content changes, so that the still image corresponds to a time point when the content changes, to determine a time point corresponding to a predetermined request when the request is received, and to call a still image corresponding to the determined time point from the memory.
 8. The electronic device of claim 7, wherein the control unit is configured to display both the streamed video image and the called still image on the display unit.
 9. The electronic device of claim 8, wherein the control unit is configured to display the still image on a second area of the video image, the second area not overlapping the specific area.
 10. The electronic device of claim 7, wherein the control unit is configured to replace an image displayed on the specific area of the streamed video image by the still image and to display the replaced still image on the display unit.
 11. An electronic device comprising: a memory; a communication unit configured to receive at least one multimedia data clip from at least one second electronic device; a display unit configured to display the at least one multimedia data clip; and a control unit configured to identify a first speaker corresponding to audio data included in the at least one multimedia data clip, to obtain information corresponding to the identified first speaker, and to store the obtained information so that the obtained information corresponds to a first time point for the at least one multimedia data clip.
 12. The electronic device of claim 11, wherein the first time point is when the first speaker begins to speak.
 13. The electronic device of claim 12, wherein the control unit is configured to analyze audio data included in the at least one multimedia data clip and to determine that the first time point is when a human voice included in the audio data is sensed.
 14. The electronic device of claim 11, wherein the control unit is configured to analyze video data included in the at least one multimedia data clip to identify the first speaker.
 15. The electronic device of claim 14, wherein the control unit is configured to identify the first speaker based on a lip motion included in the video data.
 16. The electronic device of claim 11, wherein information relating to the first speaker includes at least one of personal information on the first speaker, information on a place where the first speaker is positioned, and a keyword which the first speaker speaks.
 17. An electronic device comprising: a communication unit configured to receive at least one multimedia data clip streamed from at least one second electronic device; a memory configured to store the at least one multimedia data clip; a display unit configured to display video data included in the at least one multimedia data clip; and a control unit configured to, whenever a speaker corresponding to audio data included in the at least one multimedia data clip changes, store information corresponding to the speaker so that the information corresponds to a time point when the speaker changes, to determine a time point corresponding to a predetermined input when the predetermined input is received, and to call at least part of a multimedia data clip corresponding to the determined time point from the memory.
 18. The electronic device of claim 17, wherein the control unit is configured to display both the video data included in the streamed at least one multimedia data clip and video data included in the called at least part of the multimedia data clip.
 19. The electronic device of claim 17, further comprising a sound output unit, wherein the control unit is configured to output through the sound output unit at least one of audio data included in the streamed at least one multimedia data clip and audio data included in the called at least part of the multimedia data clip.
 20. The electronic device of claim 17, wherein the control unit is configured to display both the video data included in the streamed at least one multimedia data clip and text data corresponding to audio data included in the called at least part of the multimedia data clip. 